1,146 research outputs found

    Adaptive Clustering through Semidefinite Programming

    We analyze the clustering problem through a flexible probabilistic model that aims to identify an optimal partition of the sample $X_1, \ldots, X_n$. We perform exact clustering with high probability using a convex semidefinite estimator that can be interpreted as a corrected, relaxed version of K-means. The estimator is analyzed in a non-asymptotic framework and shown to be optimal or near-optimal in recovering the partition. Furthermore, its performance is shown to be adaptive to the problem's effective dimension, as well as to K, the unknown number of groups in this partition. We illustrate the method's performance in comparison to other classical clustering algorithms with numerical experiments on simulated data.
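For orientation, a commonly used semidefinite relaxation of K-means can be written as below. The abstract does not spell out the estimator, so this is only a sketch of what a "corrected, relaxed" K-means usually looks like, not the paper's exact definition. The symbols $\hat{\Lambda}$ (Gram matrix of the sample) and $\hat{\Gamma}$ (a correction term for the noise) are notation assumed here for illustration.

```latex
% Sketch under stated assumptions: \hat{\Lambda} and \hat{\Gamma} are hypothetical notation.
\[
  \hat{B} \in \operatorname*{arg\,max}_{B \in \mathcal{C}_K}
    \big\langle \hat{\Lambda} - \hat{\Gamma},\, B \big\rangle,
  \qquad
  \hat{\Lambda} = \big( \langle X_a, X_b \rangle \big)_{1 \le a, b \le n},
\]
\[
  \mathcal{C}_K = \Big\{ B \in \mathbb{R}^{n \times n} :\;
    B \succeq 0,\;\; B_{ab} \ge 0,\;\; \sum_{b=1}^{n} B_{ab} = 1 \text{ for all } a,\;\;
    \operatorname{tr}(B) = K \Big\}.
\]
```

With $\hat{\Gamma} = 0$ this reduces to the uncorrected relaxation. Restricting $B$ to normalized partition matrices (entries $1/|G_k|$ when $a$ and $b$ both lie in group $G_k$, and $0$ otherwise) instead of $\mathcal{C}_K$ gives back the K-means objective itself.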

    Rankin-Cohen brackets on quasimodular forms

    We equip the algebra of quasimodular forms with a collection of Rankin-Cohen operators. These operators extend those defined by Cohen on modular forms and, as for modular forms, the first of them provides a Lie structure on quasimodular forms. They also satisfy a "Leibniz rule" for the usual derivation. Rankin-Cohen operators are useful for proving arithmetic identities. In particular, we give an interpretation of the Chazy equation and explain why such an equation has to exist. Comment: 17 pages

    Model Assisted Variable Clustering: Minimax-optimal Recovery and Algorithms

    Model-based clustering defines population-level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise-corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to G-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular K-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we contrast our methods with another popular clustering method, spectral clustering, specialized to variable clustering, and show that ensuring exact cluster recovery via this method requires clusters to have a higher separation, relative to the minimax threshold. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach. Comment: Main text: 38 pages; supplementary information: 37 pages

    Enterprise Identity Management – Towards a Decision Support Framework Based on the Balanced Scorecard Approach

    Enterprise Identity Management Systems (EIdMS) are an IT-based infrastructure that needs to be integrated into various business processes and related infrastructures. Assessing and preparing decisions on their introduction requires taking costs, benefits, and organizational settings into consideration. A variety of methods for the evaluation and decision support of new IT (e.g. EIdMS) are discussed in the literature; however, these are typically based on single dimensions (e.g. financial or technology aspects). This paper proposes a multidimensional decision support framework based on the Balanced Scorecard concept. The presented approach introduces four perspectives and a related set of initial decision parameters to support decision making. The perspectives are (a) financial/monetary, (b) business processes, (c) supporting processes and (ICT) infrastructure, and (d) information security, risks and compliance. The perspectives and adaptable sets of decision parameters may also serve as a foundation for software-based decision support instruments.

    PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures

    Persistence diagrams, the most common descriptors of Topological Data Analysis, encode topological properties of data and have already proved pivotal in many different applications of data science. However, since the (metric) space of persistence diagrams is not Hilbert, they end up being difficult inputs for most Machine Learning techniques. To address this concern, several vectorization methods have been put forward that embed persistence diagrams into either finite-dimensional Euclidean space or (implicit) infinite-dimensional Hilbert space with kernels. In this work, we focus on persistence diagrams built on top of graphs. Relying on extended persistence theory and the so-called heat kernel signature, we show how graphs can be encoded by (extended) persistence diagrams in a provably stable way. We then propose a general and versatile framework for learning vectorizations of persistence diagrams, which encompasses most of the vectorization techniques used in the literature. We finally showcase the experimental strength of our setup by achieving competitive scores on classification tasks on real-life graph datasets.

    Optimal quantization of the mean measure and applications to statistical learning

    This paper addresses the case where data come as point sets, or more generally as discrete measures. Our motivation is twofold. First, we intend to approximate with a compactly supported measure the mean of the measure-generating process, which coincides with the intensity measure in the point process framework, or with the expected persistence diagram in the framework of persistence-based topological data analysis. To this aim we provide two algorithms that we prove almost minimax optimal. Second, we build from the estimator of the mean measure a vectorization map that sends every measure into a finite-dimensional Euclidean space, and investigate its properties through a clustering-oriented lens. In a nutshell, we show that in a mixture of measure-generating processes, our technique yields a representation in $\mathbb{R}^k$, for $k \in \mathbb{N}^*$, that guarantees a good clustering of the data points with high probability. Interestingly, our results apply in the framework of persistence-based shape classification via the ATOL procedure described in \cite{Royer19}.
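The two steps named in this abstract (quantize the mean measure, then use the resulting centers to vectorize each measure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithms: plain Lloyd iterations with a farthest-point initialization stand in for the quantization step, and the exponential kernel with its `scale` parameter is an assumed choice. The function names `pool`, `quantize`, and `vectorize` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pool(measures):
    """Concatenate all point sets; this is the empirical mean measure up to a 1/n factor."""
    return [p for m in measures for p in m]


def quantize(points, k, n_iter=50):
    """Approximate the pooled measure by k centers (assumed stand-in: Lloyd iterations)."""
    # Deterministic farthest-point initialization: start at the first point,
    # then repeatedly add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        cells = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            cells[nearest].append(p)
        # Move each center to the mean of its cell; keep it in place if the cell is empty.
        centers = [
            tuple(sum(coord) / len(cell) for coord in zip(*cell)) if cell else centers[j]
            for j, cell in enumerate(cells)
        ]
    return centers


def vectorize(measure, centers, scale=1.0):
    """Map one point set to R^k: coordinate i sums exp(-distance / scale) to center i."""
    return [
        sum(math.exp(-math.sqrt(sq_dist(p, c)) / scale) for p in measure)
        for c in centers
    ]


# Toy data: two point sets near the origin and two near (5, 5).
measures = [
    [(0.0, 0.0), (0.1, 0.0)],
    [(0.0, 0.1), (0.1, 0.1)],
    [(5.0, 5.0), (5.1, 5.0)],
    [(5.0, 5.1), (5.1, 5.1)],
]
centers = quantize(pool(measures), k=2)
vectors = [vectorize(m, centers) for m in measures]
```

In this toy setting, point sets from the same group get nearly identical vectors in $\mathbb{R}^2$, while point sets from different groups put their mass on different coordinates, so any standard clustering method can then be run on `vectors`.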

    Optimal quantization of the mean measure and application to clustering of measures

    This paper addresses the case where data come as point sets, or more generally as discrete measures. Our motivation is twofold. First, we intend to approximate with a compactly supported measure the mean of the measure-generating process, which coincides with the intensity measure in the point process framework, or with the expected persistence diagram in the framework of persistence-based topological data analysis. To this aim we provide two algorithms that we prove almost minimax optimal. Second, we build from the estimator of the mean measure a vectorization map that sends every measure into a finite-dimensional Euclidean space, and investigate its properties through a clustering-oriented lens. In a nutshell, we show that in a mixture of measure-generating processes, our technique yields a representation in $\mathbb{R}^k$, for $k \in \mathbb{N}^*$, that guarantees a good clustering of the data points with high probability. Interestingly, our results apply in the framework of persistence-based shape classification via the ATOL procedure described in \cite{Royer19}.

    Behaviour disorders, social competence, and physical activity among adolescents

    Behaviour problems are an important concern in the school setting. One of the principal means of intervention is social skills training. Such initiatives have shown only modest efficacy regarding the transfer, maintenance, and generalization of newly learned behaviours. The objective of this study is to compare the profiles of students with and without behaviour problems with regard to different variables associated with social skills, psychosocial adaptation, physical activity, and certain health-related lifestyle habits. Based on the results of the analysis, the authors propose new directions for intervention within social skills training programs for students with behaviour difficulties.

    The fort and powder magazines of the Île Sainte-Hélène military complex

    Built around 1820, the military complex on Île Sainte-Hélène comprised ramparts, an arsenal, storehouses, a small and a large powder magazine, a barracks, and various buildings serving the garrison. Our research, based on archaeological, historical, and architectural data, analyzes the relationship between two of the complex's functions: storage and defence. We established that the fort was designed primarily around storage needs, which is why it was built near the wharf and at a similar elevation. Placing the fort below Mont Wolf reduced its defensive value, since an enemy could land on the west shore and attack the fort from the hill. By contrast, we found that the powder magazines meet the standards and show a good balance between storage and defence needs.